A new duration modeling approach for Mandarin speech

Authors

  • Sin-Horng Chen
  • Wen-Hsing Lai
  • Yih-Ru Wang
Abstract

In this paper, a new duration modeling approach for Mandarin speech is proposed. It explicitly treats several major affecting factors as multiplicative companding factors (CFs) and estimates all model parameters with an EM algorithm. In addition, the three basic Tone 3 patterns (i.e., full tone, half tone, and sandhi tone) are properly accounted for by using three different CFs to separate their effects on syllable duration. Experimental results showed that, once the modeling eliminated the effects of those affecting factors, the variance of the syllable duration was greatly reduced from 180.17 to 2.52 frame² (1 frame = 5 ms). Moreover, the estimated CFs of those affecting factors agreed well with our prior linguistic knowledge. Two extensions of the duration modeling method are also explored. One applies the same technique to model initial and final durations. The other replaces the multiplicative model with an additive one. Lastly, a preliminary study applying the proposed model to predict syllable duration for TTS is also performed. Experimental results showed that it outperformed the conventional regression-based prediction method.
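The core idea, an observed syllable duration explained as a syllable-specific mean scaled by one companding factor per affecting factor, can be illustrated with a minimal sketch. It assumes a simplified model with only two hypothetical factors (`tone`, with Tone 3 split into three levels as the abstract describes, and utterance `position`), and it fits the multiplicative model by least squares on log durations using backfitting. This is a swapped-in estimation technique, not the paper's EM algorithm. Passing `multiplicative=False` fits the additive variant instead. All field names, factor sets, and data are illustrative assumptions.

```python
import math
from collections import defaultdict
from statistics import mean


def fit_duration_model(samples, factors, multiplicative=True, n_iter=20):
    """Backfitting fit of dur ~ mu[syllable] combined with one term per factor.

    Hypothetical sketch, not the paper's EM procedure.
    multiplicative=True: fit log(dur) = log mu + sum(log CF); terms are log CFs.
    multiplicative=False: fit dur = mu + sum(additive terms) directly.
    """
    ys = [math.log(s["dur"]) if multiplicative else s["dur"] for s in samples]
    mu = defaultdict(float)
    terms = {f: defaultdict(float) for f in factors}

    def fitted(s):
        return mu[s["syllable"]] + sum(terms[f][s[f]] for f in factors)

    def update(table, key):
        # Set each level to the mean partial residual of the samples that use it.
        resid = defaultdict(list)
        for s, y in zip(samples, ys):
            level = s[key]
            resid[level].append(y - (fitted(s) - table[level]))
        for level, r in resid.items():
            table[level] = mean(r)

    for _ in range(n_iter):
        update(mu, "syllable")
        for f in factors:
            update(terms[f], f)
            # Move the mean term into mu so the factors are identifiable;
            # every fitted value is unchanged by this shift.
            c = mean(terms[f].values())
            for level in terms[f]:
                terms[f][level] -= c
            for syl in mu:
                mu[syl] += c
    return {"mu": dict(mu), "terms": {f: dict(t) for f, t in terms.items()},
            "factors": list(factors), "multiplicative": multiplicative}


def predict_duration(model, sample):
    """Predicted duration (in frames) for one syllable token."""
    z = model["mu"][sample["syllable"]] + sum(
        model["terms"][f][sample[f]] for f in model["factors"])
    return math.exp(z) if model["multiplicative"] else z


def companding_factor(model, factor, level):
    """CF of one level relative to the geometric mean of its factor (multiplicative model)."""
    return math.exp(model["terms"][factor][level])


if __name__ == "__main__":
    # Synthetic, noise-free data (illustrative values only, not from the paper).
    base = {"ma": 40.0, "shi": 50.0}
    tone = {"1": 1.0, "2": 1.1, "3-full": 1.3, "3-half": 0.8, "3-sandhi": 1.05, "4": 0.9}
    pos = {"initial": 0.95, "medial": 0.9, "final": 1.25}
    data = [{"syllable": s, "tone": t, "position": p, "dur": base[s] * tone[t] * pos[p]}
            for s in base for t in tone for p in pos]
    model = fit_duration_model(data, ["tone", "position"])
    ratio = companding_factor(model, "tone", "3-full") / companding_factor(model, "tone", "1")
    print(round(ratio, 3))  # 1.3
```

Fitting in the log domain turns the product of CFs into a sum, so each backfitting step is a closed-form group mean. The centering step only resolves the scale ambiguity between the syllable means and the CFs; it leaves every prediction unchanged.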


Similar resources

A novel syllable duration modeling approach for Mandarin speech

In this paper, a novel syllable duration modeling approach for Mandarin speech is proposed. It explicitly takes several main affecting factors as multiplicative companding parameters and estimates all model parameters by an EM algorithm. Experimental results showed that the variance of the observed syllable duration was greatly reduced from 183.4 frame (1 frame = 5 ms) to 18.5 frame by eliminat...


Duration modeling and memory optimization in a Mandarin TTS system

Current speech synthesis efforts, both in research and in applications, are dominated by methods based on concatenation of spoken units. New progress in the concatenative text-to-speech (TTS) technology can be made mainly from two directions, either by reducing the memory footprint to integrate the system into embedded system, or by improving the synthesized speech quality in terms of intelligi...


Modeling Duration and Tonal Coarticulation in a Mandarin Chinese Synthesis

We present in this paper the results of a duration study and a tonal coarticulation study designed for the concatenative Mandarin Chinese synthesis system developed at the Dresden University of Technology. It is reported that the duration model and the tonal coarticulation model are the two most important components of the prosody control in Mandarin. The material for the study of the two proso...


Modeling Duration and Intonation in Mandarin Chinese Synthesis with a Neural Network

The prosody control plays an important role in the naturalness of synthesized speech. In previous work, great efforts have been made to generate rule-based or parameter-based prosodic models [6]. In order to capture the complex interaction of different relevant prosodic factors, neural networks were recently employed. This paper presents a new method of learning and modeling duration and intona...


Improved generation of prosodic features in HMM-based Mandarin speech synthesis

The HMM-based Text-to-Speech System can produce high quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS durations are typically modeled statistically using state duration probabili...


Journal title:
  • IEEE Trans. Speech and Audio Processing

Volume 11

Publication date: 2003